Abstract: Internet and online-based social systems are rising 
as the dominant mode of communication in society. However, 
the public or semi-private environments in which most online 
communications take place make them unsuitable channels 
for speaking with others about personal or emotional 
problems. This has led to the emergence of online platforms 
for emotional support offering free, anonymous, and confidential 
conversations with live listeners. Yet very little is known about 
how these platforms are used, and whether their features and 
design foster strong user engagement. This paper explores the 
utilization and the interaction features of hundreds of thousands 
of users on 7 Cups of Tea, a leading online platform offering 
online emotional support. It dissects users’ activity levels and 
the patterns by which they engage in conversation with each 
other, and uses machine learning methods to find factors 
promoting engagement. The study may be the first to measure 
activities and interactions in a large-scale online social system 
that fosters peer-to-peer emotional support. 

I. Introduction and Motivation 
Fig. 1: Browsing for new listeners in 7cot. (1) Listener profiles; 
(2) searching for listeners by various criteria; (3) connecting 
with a listener immediately. 


Internet and online-based social platforms, encompassing 
online social networks such as Facebook, LinkedIn, and 
Twitter and messaging services like Snapchat and Kik, are 
rising as the dominant way people in society communicate 
with each other. On these platforms, users are surrounded by 
‘friends’ or ‘colleagues’ who may be happy to help a person 
presently going through a period of emotional distress. Yet 
the public or semi-public nature of these platforms, as well 
as the permanence of their communication records, means they 
are less than ideal mediums to seek and receive emotional 
support. There is therefore a need for online social systems 
that offer private, anonymous, quick, and live emotional 
support for those who prefer to communicate online and need 
immediate help [1], [2]. Existing systems for this purpose 
vary in the type of support offered, from generic 
advice for common emotional conditions¹ to 
self-diagnosis for a condition [3]. Some systems also offer access 
to a live therapist when a user is suffering from a specific 
condition, such as suicide contemplation² or after receiving 
a critical health prognosis³ [4], [5], [6]. Past studies of 
online systems connecting users to a live listener confirm their 
effectiveness [7]; however, these systems are limited to helping 
only those who suffer from a particular ailment. 

¹ http://www.stress.org/emotional-and-social-support 
² http://www.crisischat.org, http://www.befrienders.org 
³ http://www.cancersupportcommunity.org 


In order to provide a safe, anonymous space for users to 
find emotional support for problems of any size, the online 
social system 7 Cups of Tea⁴ (7cot) was developed. As seen 
in Figure 1, 7cot fosters an active community or crowd of 
“listeners” who are individuals trained to support people facing 
a wide range of emotional problems. People needing emotional 
support may use the service to immediately and privately 
engage in one-on-one conversations with listeners or connect 
to themed group chat rooms. In less than two years, 7cot has 
attracted a community of hundreds of thousands of members 
and fostered millions of one-on-one conversations. Its rapid 
growth suggests a significant demand for creating online 
spaces where users can find and offer emotional support. 

Beyond our knowledge that therapeutic support can be 
effectively delivered online [7], we know very little about 
how online emotional support platforms are utilized by users, 
the mechanisms with which users connect to listeners, and 
the design choices that encourage long-term user engagement. 
This paper therefore studies the utilization, interactions, and 
engagement of users on 7cot. It specifically explores: (i) the 
degree to which activities are performed by different types 
of users; (ii) the interaction structure of member-to-listener 
conversations and the relationships among members (listeners) 
connecting with common listeners (members); and (iii) models 
that identify the user and platform features encouraging 
long-term user engagement. The findings are connected to useful 
insights on how to improve existing platforms, to create 
effective new ones, and to better understand how the Internet 
is currently used as a ‘crowdsourced emotional support’ tool. 

⁴ http://www.7cupsoftea.com 

Fig. 2: 7cot member interface. (1) List of current 
conversations; (2) selections to connect to any listener; 
(3) conversation window; (4) progress values; (5) access to 
chat forums and progression metrics. 

The layout of this paper is as follows: Section II gives a 
broad overview of the 7cot platform. Section III explores the 
activity of different types of users on 7cot. Section IV studies 
the structure of interactions (conversations) between members 
and listeners. The factors that drive member engagement and a 
model that predicts long-term engagement are presented in 
Section V. A summary of our findings and concluding remarks 
is given in Section VI. 


II. Overview of 7 Cups of Tea 


7cot launched on December 5th 2013. The service is used by 
three types of users: members choosing to register an account 
in order to speak with someone, listeners who register to listen 
to the problems of others and are required to take an online 
training class, and guests who choose not to register but still 
wish to converse with listeners. Users may take on multiple 
types; for example, a member who passes the required training 
class may become a ‘hybrid’ who is also a listener. As listed in 
Table I, as of November 18, 2014, the site is populated by 
87,232 members, 33,601 listeners, and 12,038 hybrid users. 
Members and listeners identify themselves as either a 
teenager or an adult so that members connect with an 
appropriate listener. Once logged in, users wanting to 
self-diagnose or support themselves can access self-help guides. 

Users communicate with others in three different channels: 
group chats, conversations, and forums. Group chats are free 
exchanges in which multiple guests, members, and listeners may 
participate. Conversations are private exchanges of messages 
between members or guests and listeners. A conversation 
is a single, permanent connection lasting for an indefinite 
amount of time. Members and guests are able to start a 
conversation with any listener who is currently online, or may 
search for a listener satisfying some criteria. Members search 
for listeners through the interface in Figure 1. It offers profiles 
of the listeners matching the criteria entered in the top bar, and 
another option to immediately connect with any active listener 
should the member be in crisis. The interface members use to 
access various communication channels is shown in Figure 2. 
The left panel shows all conversations the user participates in 
and gives options to create new conversations; the right panel 
shows an active conversation; the status bar at the top right 
displays values related to members’ emotional progress; and 
the menu options lead to the forum and member profile. 

“Gaming” or “progress” mechanisms are integrated into 
the site to represent user reputation and experience. Listeners 
gradually accrue ‘cheers’ over time, and after attaining certain 
amounts their ‘listener level’ is upgraded to a more prestigious 
category. Listeners also achieve ‘badges’ displayed on their 
profile for accomplishing tasks like helping members facing 
a specific type of need (e.g., loss of a loved one). Members 
accrue ‘growth points’ for performing simple activities such 
as posting on the forum, or sending messages during a 
conversation. Accruing enough ‘growth points’ will upgrade 
their ‘member level’, a rank that reflects a commitment to the 
site and progress toward improved mental health. 

7cot shared a database capturing the attributes of all users, 
interactions, and activities performed from its inception on 
December 5, 2013 through November 18, 2014. The 
database includes metadata about every user except for those 
attributes related to the user’s true identity and contact 
information. Attributes of each conversation record were limited to 
participant identifiers, the date the conversation commenced, 
the number of messages exchanged by each party, whether 
the conversation was for a teenager or adult member, whether 
the conversation was terminated by the member or listener, and 
the timestamp of the last message sent. User behaviors on 
the site were captured between May 7 and November 18, 2014. 
For privacy reasons, the only actions captured are the number 
of messages sent, requests made, forum posts made, logins, 
forum views, help guide views, and page views through the 
mobile app or Web browser per user per day. 

III. Platform Activity 

Table I is divided into three sections that summarize 
participation, actions, and conversations. The participation 
statistics in section (a) underscore the size and volume 
of activity on 7cot. In an 11-month span, approximately 1.27M 
conversations were held between 87,232 members seeking help 
and 33,601 listeners. In addition, 12,038, or 10.0% of all 
users, are hybrids (both a member and a listener). The rate at 
which conversations are initiated rose at an exponential pace 
over 7cot’s first 9 months, as shown in Figure 3a (note that 
conversations initiated in November 2014 cover only 
approximately two weeks). We also explore the temporal 
patterns of conversations during the week of August 10, 2014 
in Figure 3b. The labels on the x-axis are centered at 12pm. 
The figure shows a diurnal pattern, with larger volumes of 
conversations commencing in the middle of the week. 
Furthermore, most conversations are initiated in the morning 
and overnight hours, with a lull in activity at midday. These 
patterns suggest that members have a preference to share 
information during the evening or even overnight hours. 

TABLE I: Summary of 7cot participation, actions, and conversations 

(a) Participation (Dec 5 - Nov 18) 
  Num. Conversations            1.27M 
  Distinct Forums               53 
  Chatroom Messages             1.07M 
  Forum Posts                   82,223 
  Num. Members                  87,232 
  Num. Listeners                33,601 
  Num. Hybrid                   12,038 

(b) Actions, avg. per user per active day (May 7 - Nov 18) 
  Logins                        2.41 
  Conversation Messages         62.28 
  Conversation Requests         1.83 
  Forum Posts                   2.93 
  Forum Post Views              6.38 
  Page Views                    15.98 
  Help Guide Views              4.12 

(c) Conversations (Dec 5 - Nov 18) 
  Volume (by Users)             413,256 (adult); 131,449 (teenager) 
  Volume (by Guests)            493,365 (adult); 229,918 (teenager) 
  Type: General                 522,863 (adult); 224,939 (teenager) 
  Type: Personal                383,758 (adult); 136,428 (teenager) 
  Messages (by Non-Listeners)   14.77M (adult); 4.28M (teenager) 
  Messages (by Listeners)       13.54M (adult); 4.12M (teenager) 
  Terminations                  61,435 (members); 196 (listeners) 

(a) Conversations per month (b) Conversations per day of week 

Fig. 3: Conversation rates on 7 Cups of Tea 

Section (b) of Table I summarizes the rate of actions 
undertaken by users per day, not counting the days when a user 
does not perform the action. For example, users log in an average 
of 2.41 times across all days on which they log in at least 
once. Furthermore, members connect to an average of 1.83 
listeners per day on which they decide to connect to a new 
listener, and submit an average of 62 conversation messages per 
day on which they send messages. These statistics indicate 
that members are not hesitant to reach out to many 
listeners multiple times per day. In fact, the platform’s ability 
to let a member communicate with many others, rather than 
a single professional, is a key differentiator between seeking 
online and offline help. For example, 7cot members may 
listen to the thoughts and perspectives of a large number of 
others, searching for resolution by considering many 
viewpoints. Section (b) also shows that forum participation 
and seeking self-guided help are less popular than 
participating in one-on-one conversations. 
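The per-active-day averages in Table I(b) can be reproduced from daily per-user action counts. The sketch below assumes a hypothetical input of (user, day, count) rows for a single action and pools all user-days on which the count is positive; averaging per user first and then across users is an equally plausible reading of the table, so the aggregation order here is an assumption.

```python
from collections import defaultdict


def per_active_day_average(rows):
    """Mean count of one action over the (user, day) pairs where it occurred.

    rows: iterable of (user_id, day, count) tuples (hypothetical schema).
    Days on which a user did not perform the action (count == 0) are skipped,
    mirroring "not counting the days when a user does not perform the action".
    """
    totals = defaultdict(int)
    for user_id, day, count in rows:
        if count > 0:
            totals[(user_id, day)] += count  # merge duplicate rows for the same day
    if not totals:
        return 0.0
    return sum(totals.values()) / len(totals)


# Usage: user "a" logs in 3 times on May 7, not at all on May 8, once on May 9;
# user "b" logs in twice on May 7. The zero day is ignored.
logins = [
    ("a", "2014-05-07", 3),
    ("a", "2014-05-08", 0),
    ("a", "2014-05-09", 1),
    ("b", "2014-05-07", 2),
]
print(per_active_day_average(logins))  # 2.0
```

Pooling over user-days weights heavy users more than a per-user average would; which choice underlies Table I is not stated in the text.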

Section (c) of Table I lists summary information about 
conversations, broken down by whether a participant is a 
teenager or an adult. Of the 1.27M conversations, more than 
half (about 57%) are initiated by guests. This reflects the 
demand for platforms that let people connect and speak with 
others immediately, without going through an extensive 
registration process beforehand. It also demonstrates an 
untapped opportunity for a platform to transform guests who had 
positive experiences into members or listeners who can further 
build its community. Section (c) also gives the breakdown of 
conversations that are “general” or “personal”. “General” 
conversations are ones where a member asks the platform to 
connect to any listener, whereas in “personal” conversations a 
member asks to talk to a specific listener. Across both types, 
about 28.5% of all conversations involve teenagers, supporting 
the hypothesis that younger generations find online platforms 
to be a desirable way to express their problems and find 
support. Users also tend to initiate conversations without 
regard for who the listener is, with far more “general” than 
“personal” conversations. People seeking emotional support 
from a crowd may therefore be less interested in the kind or 
expertise of a listener. It could also be a reflection of 7cot’s 
design, which lets members connect to any listener quickly 
across many member interfaces. The section also shows that 
61,631, or 4.9%, of all conversations are ‘canceled’ by a user. 
Canceled conversations are ones where a participant decides to 
permanently terminate a conversation. The relatively small 
percentage suggests that users rarely share offensive, 
derogatory, or other messages that would lead to conversation 
termination, and that users are therefore mostly civil and 
supportive to each other. Conversations are terminated far more 
often by members (61,435) than by listeners (196), possibly 
when members disagree with the listeners’ suggestions or find 
the conversation unable to solve their emotional problem. 
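The shares quoted in this paragraph follow directly from the counts in Table I(c). The short check below recomputes them; the variable names are ours, and the per-type teenager shares are included because they show how the overall teenager figure splits between general and personal conversations.

```python
# Conversation counts from Table I(c), stored as (adult, teenager) pairs.
by_members = (413_256, 131_449)
by_guests = (493_365, 229_918)
general = (522_863, 224_939)
personal = (383_758, 136_428)
terminations = (61_435, 196)  # (terminated by members, by listeners)

total = sum(by_members) + sum(by_guests)
# The general/personal split covers the same set of conversations.
assert total == sum(general) + sum(personal)

guest_share = sum(by_guests) / total
teen_share = (by_members[1] + by_guests[1]) / total
teen_share_general = general[1] / sum(general)
teen_share_personal = personal[1] / sum(personal)
canceled = sum(terminations)
canceled_share = canceled / total

print(f"total conversations:  {total:,}")
print(f"initiated by guests:  {guest_share:.1%}")
print(f"involving teenagers:  {teen_share:.1%} "
      f"(general {teen_share_general:.1%}, personal {teen_share_personal:.1%})")
print(f"canceled:             {canceled:,} ({canceled_share:.1%})")
```

Running it gives 1,267,988 conversations in total, a guest share of 57.0%, a teenager share of 28.5% (30.1% of general and 26.2% of personal conversations), and 61,631 canceled conversations (4.9%).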

IV. Interaction Structure 

We next study the patterns of member engagements with 
listeners on 7cot. The patterns are found through analysis of 
a network where members and listeners are connected if they 
held at least one conversation with each other. We also study 
networks that connect members (listeners) to each other if 
they had a conversation with at least one common listener 
(member). Structural analyses of the networks inform how 
members are choosing to engage with listeners on 7cot, if 
some subsets of listeners are more popular than others, and 
if a pattern of members selectively choosing listeners can be 
seen. 

We represent all 7cot interactions as a bipartite network 
from members to listeners. We consider all 465,437 
conversations that contained at least one message sent by 
either a member or listener (note that guests are excluded from 
this analysis and will be the subject of future work). Table II 
lists the structural features of this bipartite network. The 
network has an average degree ⟨k⟩ of 5.39, i.e., members tend 
to connect to between five and six distinct listeners during 
their time on the service. This reaffirms the idea that members 
seek help from a number of others, perhaps to obtain different 
viewpoints or thoughts about their emotional problem. We 
also computed the number of connected components in the 
network. Only 477 disconnected components exist, the largest 
of which (GCC) includes virtually every user (99.2%) on the 
platform. In other words, there are virtually no members or 
listeners on 7cot who choose to exclusively search for and 
communicate only with each other. The single large GCC 
lets us compute the average path length in the network as 
$d = \log(N/z_1)/\log(z_2/z_1) + 1$, an expression valid for 
networks that are nearly fully connected [8], where N is the 
number of users and $z_1$ and $z_2$ are the average number of 
others a user can reach within one and two hops, respectively. 
The small average path length d = 3.46 may be indicative of 
the existence of a large ‘core’ of members and listeners serving 
as hubs that connect members and listeners to others across the 
bipartite structure. Listeners in the ‘core’ may thus connect to 
large and diverse sets of members; they may be the listeners 
who are matched with members requesting to speak with any 
available listener. 

TABLE II: Bipartite and projection network features 

               Bipartite Network   Member Proj.     Listener Proj. 
  |V|          117,372             86,877           30,495 
  |E|          465,437             12,657,611       10,359,604 
  ⟨k⟩          5.39                291.39           679.43 
  C            N/A                 0.734            0.636 
  A            N/A                 -0.10            -0.06 
  d            3.46                2.56             2.30 
  ρ            N/A                 0.003            0.022 
  Components   447                 447              447 
  GCC Size     116,411 (99.2%)     86,364 (99.4%)   30,047 (98.5%) 

(a) Member network sample (b) Listener network sample 

Fig. 4: Edge-sampled projection networks with nodes colored 
by clustering coefficient 
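The estimate $d = \log(N/z_1)/\log(z_2/z_1) + 1$ needs only the mean numbers of first and second neighbours. Below is a minimal sketch, assuming an undirected graph stored as a dict of neighbour sets and the usual convention for this estimate that $z_2$ counts users at exactly two hops (the text's "within two hops" could also be read as including first neighbours). It is exercised on the Petersen graph, a toy example rather than 7cot data.

```python
import math
from collections import deque


def hop_counts(adj, source, max_hops=2):
    """BFS from source; return how many nodes lie at each distance 0..max_hops."""
    dist = {source: 0}
    queue = deque([source])
    counts = [0] * (max_hops + 1)
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond max_hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                counts[dist[v]] += 1
                queue.append(v)
    return counts


def estimated_path_length(adj):
    """Return (d, z1, z2) with d = log(N / z1) / log(z2 / z1) + 1."""
    n = len(adj)
    z1 = z2 = 0.0
    for u in adj:
        counts = hop_counts(adj, u)
        z1 += counts[1]
        z2 += counts[2]
    z1, z2 = z1 / n, z2 / n
    return math.log(n / z1) / math.log(z2 / z1) + 1, z1, z2


def petersen_graph():
    """Toy 10-node, 3-regular graph used only to exercise the code."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        # outer 5-cycle, spoke, inner pentagram
        for a, b in ((i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)):
            adj[a].add(b)
            adj[b].add(a)
    return adj


d, z1, z2 = estimated_path_length(petersen_graph())
print(z1, z2, round(d, 3))  # 3.0 6.0 2.737
```

The formula requires $z_2 > z_1$; on graphs whose second neighbourhoods are no larger than the first (a cycle, for instance) the denominator is zero or negative. On this tiny graph the exact mean distance is 15/9 ≈ 1.67, so the estimate is only indicative for large, random-like networks.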

We omit measuring the clustering coefficient C, degree 
assortativity A, and density ρ of the bipartite network because 
their definitions are closely related to measurements taken over 
the network’s one-mode projections [9]. One-mode projections 
capture the structure of interaction co-occurrences among the 
g listeners and n members of 7cot. Given a matrix 
$B \in \{0,1\}^{g \times n}$, where $B_{ij} = 1$ if listener i has a 
conversation with member j, we define 
$P^{(m)} = B^{\top}B \in \mathbb{N}^{n \times n}$ and 
$P^{(l)} = BB^{\top} \in \mathbb{N}^{g \times g}$ as the adjacency 
matrices of the member and listener projection networks, 
respectively. We then have $P^{(m)}_{ij} = c$ ($P^{(l)}_{ij} = c$), 
for $i \neq j$, if members (listeners) i and j hold conversations 
with c common listeners (members). Structural patterns within the 
projection networks are discussed next. 
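The projections $P^{(m)} = B^{\top}B$ and $P^{(l)} = BB^{\top}$ never need to be formed as dense matrices: each listener adds one to every pair of its members, and each member adds one to every pair of its listeners. The sketch below assumes a hypothetical list of (listener, member) conversation pairs and returns only the nonzero off-diagonal entries. A listener with k members contributes k(k−1)/2 pairs, which is why the projections in Table II have millions of edges.

```python
from collections import Counter, defaultdict
from itertools import combinations


def one_mode_projections(pairs):
    """Off-diagonal entries of B^T B (members) and B B^T (listeners).

    pairs: iterable of (listener_id, member_id); repeated pairs count once,
    because B is a 0/1 matrix. Keys are sorted id pairs; values are the number
    of common listeners (member projection) or common members (listener one).
    """
    members_of = defaultdict(set)    # listener -> members (row i of B)
    listeners_of = defaultdict(set)  # member -> listeners (column j of B)
    for listener, member in pairs:
        members_of[listener].add(member)
        listeners_of[member].add(listener)

    member_proj = Counter()
    for members in members_of.values():
        for a, b in combinations(sorted(members), 2):
            member_proj[(a, b)] += 1   # one more common listener
    listener_proj = Counter()
    for listeners in listeners_of.values():
        for a, b in combinations(sorted(listeners), 2):
            listener_proj[(a, b)] += 1  # one more common member
    return member_proj, listener_proj


pairs = [("L1", "m1"), ("L1", "m2"), ("L2", "m1"),
         ("L2", "m2"), ("L2", "m3"), ("L2", "m3")]
P_m, P_l = one_mode_projections(pairs)
print(dict(P_m))  # {('m1', 'm2'): 2, ('m1', 'm3'): 1, ('m2', 'm3'): 1}
print(dict(P_l))  # {('L1', 'L2'): 2}
```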

A. Connectivity patterns 

Table II gives the mean degree, global clustering coefficient, 
degree assortativity, average path length, density, and GCC 
size of the member and listener projection networks. These 
statistics may be compared with a visualization of a random 
sampling [10] of 10,000 edges of the projection networks in 
Figure 4. Nodes are colored hotter in the figure if they have a 
higher local clustering coefficient $C_i$ (green nodes have 
$C_i = 0$ and red nodes have $C_i = 1$) and are drawn under a 
force-directed layout so that nodes separated by small distances 
are positioned closer together. Although sophisticated sampling 
algorithms are needed to create samples that maintain many 
structural features of the sampled network [11], edge sampling 
still conveys the shape of the global network within the 
interconnected core of the sample (nodes participating in 
excessive numbers of open triangles are likely an artifact of 
edge sampling). The high mean degree, large GCC size, and small 
average path lengths of both projections further support the 
hypothesis that members and listeners do not limit themselves 
to interacting with a small subset of listeners (members). Both 
projections exhibit weak negative degree assortativity, 
suggesting a slight tendency for members (listeners) who share 
just a few common listeners (members) with others to share them 
with those who have large numbers of listeners (members) 
in common with others. However, the lower degree, larger 
clustering coefficient, and larger path length of the member 
network imply a weak penchant for members to form clusters 
by the common listeners they connect to. Such clusters can 
be seen in Figure 4a as cliques in the core of the member 
network. These clusters may be traces of member groups that 
connect to similar ‘types’ of listeners. 

(a) Member projection (b) Listener projection 

Fig. 5: Projection network degree distributions 
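The degree assortativity A in Table II is the Pearson correlation between the degrees at the two ends of each edge. Below is a minimal sketch for an unweighted, undirected simple graph; whether the projections' edge weights c were used is not stated, so ignoring them is an assumption. Each edge is counted in both directions so that the measure is symmetric.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over both orientations of each edge."""
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# A star: the hub only links to leaves, so the graph is perfectly disassortative.
star = [(0, 1), (0, 2), (0, 3)]
print(degree_assortativity(star))  # -1.0
```

If every node has the same degree the variance is zero and the coefficient is undefined; this sketch then raises `ZeroDivisionError` rather than returning a value.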

We find that the degree distributions of the projection networks, 
presented in log-log scale in Figure 5, take dissimilar 
shapes. The listener degree distribution exhibits a near 
straight-line pattern indicative of a power-law distribution, but 
the pattern is less pronounced in the member degree distribution. 
We quantify this difference by running a maximum-likelihood-based 
test of the null hypothesis $H_0$: the empirical data have 
a power-tailed distribution (the test also yields the best-fitting 
power-law exponent α under the null) [12]. The test leaves 
little room to reject $H_0$ for the listener degree distribution 
(p = 0.985; α = 2.51), but there is more doubt for the 
member degree distribution (p = 0.362; α = 2.34). That the 
listener degree distribution has a power tail suggests significant 
variation in the number of common members listeners share 
with each other, and that the probability of sharing orders 
of magnitude more members than expected is not negligible. 
A similar statement could be made about members; however, 
they may exhibit less variation, since we are less confident that 
a power-tailed trend exists. The difference in the distributions’ 
shapes may be explained by members who only need to 
connect to a limited number of listeners in order to have many 
problems resolved, or by members who choose to connect 
deeply with a small number of listeners. Such behaviors place 
a ‘soft limit’ on the largest number of listeners members 
may connect to, weakening the support for a power tail to 
emerge [13]. On the other hand, so long as a listener is 
available for newly added members to connect to, there may 
be no limit on the number of new members a listener may 
connect to over time. 
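A test of this kind typically fits the exponent by maximum likelihood and compares the fitted tail with the data through a Kolmogorov–Smirnov (KS) distance; the p-value then comes from repeating the fit on synthetic samples drawn from the fitted model, which is omitted here for brevity. The sketch below assumes a fixed, known lower cutoff x_min and uses the continuous estimator $\hat{\alpha} = 1 + n\left[\sum_i \ln(x_i/x_{\min})\right]^{-1}$. Degrees are integers, so this is only an approximation, and the exact settings behind Figure 5 are not given in the text.

```python
import math
import random


def fit_alpha(data, xmin):
    """Continuous power-law MLE for the tail x >= xmin; returns (alpha_hat, n_tail)."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + n / log_sum, n


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical tail CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # compare against the empirical CDF just before and just after x
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap


# Synthetic check: draw from a power law with alpha = 2.5 by inverse transform.
rng = random.Random(0)
xmin, true_alpha = 1.0, 2.5
sample = [xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
          for _ in range(20_000)]
alpha_hat, n_tail = fit_alpha(sample, xmin)
print(round(alpha_hat, 2), n_tail, round(ks_distance(sample, xmin, alpha_hat), 4))
```

In a full analysis x_min would itself be chosen, for instance by minimizing the KS distance over candidate cutoffs, before the bootstrap p-value is computed.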

B. Centrality analysis 

We also study connectivity-based notions of network 
centrality in the projection networks. We first consider the 
betweenness centrality of a user u, defined as 
$b(u) = \sum_{i \neq u \neq j} \sigma_{ij}(u)/\sigma_{ij}$, where 
$\sigma_{ij}$ is the number of shortest paths from users i to j 
and $\sigma_{ij}(u)$ is the number of such paths that include u. 
This measure reflects the notion that a user is ‘central’ if she 
is often part of the shortest path between two others in the 
network. Figure 6a plots the cumulative distribution (CDF) of 
the centrality scores across the two networks on a semi-log 
scale. Its rapid ascent and long left tail indicate that almost 
all users are part of a number of shortest paths in the network. 
The networks are therefore structurally robust to the loss of 
users. We also consider the closeness centrality of a user u, 
defined as $c(u) = \left(\sum_{j} d(u, j)\right)^{-1}$, where 
$d(u, j)$ is the distance from user u to j. Figure 6b gives the 
CDF of closeness centrality on the two networks (note that 
the x-axis is not in log scale). That the CDF for the listener 
distribution is stretched farther than the member distribution 
is only because there are fewer nodes in the network. Unlike 
betweenness centrality, the closeness centrality CDFs of the two 
networks take on different shapes. The CDF of the member 
network has only a slight curvature at its left and right tails, 
with a nearly linear body. This suggests that the centrality 
scores exhibit a small peak around the mean of the distribution 
but are otherwise uniformly distributed. The centrality scores 
of listeners are uniformly distributed up to approximately the 
40th percentile, at which point they become heavily skewed. 
The majority of listeners, those above this 40th percentile, are 
therefore at a much shorter distance from the rest of the 
network than those below it. This pattern may be indicative of 
a core-periphery structure [14] in the listener projection 
network that does not exist in the member one, where those in 
the core (periphery) have high (low) closeness centrality. The 
probability of a listener falling in the core may be correlated 
with the diversity of the members she connects to: connecting 
to many different members increases the probability of sharing 
a connection with a listener already in the core. 
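Both centrality measures can be computed with breadth-first searches when the projection edges are treated as unweighted (an assumption; the text does not say whether the common-neighbour counts c were used as weights). Closeness follows the definition $c(u) = \left(\sum_j d(u, j)\right)^{-1}$ directly. Betweenness uses Brandes' accumulation, which avoids enumerating shortest paths explicitly, and the result is halved so that each unordered pair {i, j} is counted once.

```python
from collections import deque


def closeness(adj):
    """c(u) = 1 / sum_j d(u, j), summing over the nodes reachable from u."""
    scores = {}
    for u in adj:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            w = queue.popleft()
            for v in adj[w]:
                if v not in dist:
                    dist[v] = dist[w] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[u] = 1.0 / total if total else 0.0
    return scores


def betweenness(adj):
    """Brandes' algorithm: b(u) = sum over pairs {i, j} of sigma_ij(u) / sigma_ij."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            w = queue.popleft()
            order.append(w)
            for v in adj[w]:
                if v not in dist:
                    dist[v] = dist[w] + 1
                    queue.append(v)
                if dist[v] == dist[w] + 1:  # w precedes v on a shortest path
                    sigma[v] += sigma[w]
                    preds[v].append(w)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was visited once from each endpoint
    return {v: b / 2.0 for v, b in bc.items()}


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(betweenness(path))  # {0: 0.0, 1: 1.0, 2: 0.0}
print(closeness(path))    # node 1 sums distances 1 + 1 = 2, so c(1) = 0.5
```

Each source costs one BFS, so the total work grows with the number of nodes times the number of edges; on projections with millions of edges, sampling source nodes is a common approximation.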

C. Network transitivity 

Fig. 6: Centrality distributions for members and listeners 

(a) Member projection (b) Listener projection 

Fig. 7: Projection network clustering coefficient distributions 

Finally, we use the local clustering coefficient distributions 
of the projection networks to study the tendency toward 
transitive relationships among members and listeners. A 
transitive relationship is one where, if user A is a member 
(listener) connected to user B and B is connected to C, then A is 
connected to C. Table II lists the global clustering coefficient, 
defined as the average over users of the number of closed 
triangles in a user’s neighborhood divided by the number of 
possible links that could exist within it [15], as C = 0.734 and 
0.636 for the member and listener projections, respectively. The 
large coefficients signify that transitive relationships dominate 
the projection networks. However, histograms of the local 
clustering coefficients $C_i$ in the member and listener networks 
in Figure 7 show that the large values are driven by the 
38.9% of members and 13.2% of listeners whose $C_i = 1$. The 
high values of C are therefore driven by a minority 
of users with fully connected neighbors. When we consider 
users whose $C_i < 1$, their clustering coefficients appear to be 
normally distributed. Normally distributed $C_i$ values are a 
typical phenomenon in co-occurrence networks spanning many 
systems, including scientific paper authorship [16], [17], 
e-commerce co-purchases [18], and “related page” relationships 
on search engines [19], but the surge of members with 
$C_i = 1$ is unique to 7cot interactions. This suggests that users 
with $C_i = 1$ may not emerge from some natural or universal 
process innate to all co-occurrence networks. This may indicate 
that both members and listeners perform deliberate actions 
that drive them into fully connected neighborhoods in the 
projection networks. For example, members may be selectively 
connecting to the same pool of listeners, who may have similar 
ratings, experiences, or bios suggesting an expertise that 
other members in their neighborhood also seek. Finally, it is 
interesting to note that the proportion of members with 
$C_i = 1$ (38.9%) is very similar to the proportion of personal 
conversations (where a member chooses a listener to connect to) 
in Table I (39.8%). 
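The statistics discussed in this subsection (the average local clustering coefficient reported as C, and the share of users with $C_i = 1$) can be computed as below. The sketch assumes an unweighted adjacency dict with comparable node ids and sets $C_i = 0$ for users with fewer than two neighbours, a common convention that the text does not specify.

```python
def local_clustering(adj):
    """C_i = links among i's neighbours / (k_i (k_i - 1) / 2); 0 if k_i < 2."""
    scores = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            scores[i] = 0.0
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        scores[i] = links / (k * (k - 1) / 2)
    return scores


def clustering_summary(adj):
    """Average C_i, share of nodes with C_i = 1, and the sorted C_i < 1 values."""
    scores = local_clustering(adj)
    n = len(scores)
    mean_c = sum(scores.values()) / n
    share_full = sum(1 for c in scores.values() if c == 1.0) / n
    below_one = sorted(c for c in scores.values() if c < 1.0)
    return mean_c, share_full, below_one


# A triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
mean_c, share_full, below_one = clustering_summary(adj)
print(round(mean_c, 4), share_full, below_one)  # 0.5833 0.5 [0.0, 0.3333333333333333]
```

The `below_one` list is what a histogram of the $C_i < 1$ users, as in Figure 7, would be drawn from.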

V. Understanding User Engagement 

Next, we perform an engagement analysis of members on 
7cot. Engagement analysis offers insights about the user and 
platform features that encourage members to return, listeners 
to stay active, and members to have multiple, fruitful 
conversations. Such insights are practically important to 
help a platform retain new members and grow its community 
of listeners. They also identify qualities that encourage people 
to seek follow-up emotional support. Due to space limitations, 
we will consider listener engagement in future work. 

  Feature              Pearson r 
  Coins                 0.247 
  Growth Points         0.977 
  Compassion Hearts     0.243 
  Signup Date          -0.009 
  Last Login Date       0.133 
  Distress Level        0.004 
  Group Chat Msgs       0.120 
  Page Views (Web)     -0.002 
  Page Views (iOS)     -0.001 
  Login Count          -0.001 
  Conv. Requests        0.001 
  Self Help Views       0.005 
  Forum Posts          -0.001 
  Forum Views          -0.001 
  Forum Up-votes        0.201 

TABLE III: Pearson correlation between message rate and user 
or behavior features 


A. Factors driving engagement 

We first relate the features and behaviors of members to a 
measure of site engagement. Since sharing with listeners is the 
purpose of the service, we quantify engagement as the message 
rate of a member, that is, the average number of messages sent 
per day in conversations. We consider features and behaviors 
that, based on discussions with psychologists and designers at 
7cot, may be related to engagement: (i) number of coins, growth 
points, and compassion hearts, which are gaming and progress 
measures related to a member’s reputation and experience; 
(ii) signup and last login date; (iii) reported distress level 
when members register; (iv) number of group chat messages; 
(v) number of page views from the 7cot Web and iOS 
applications; (vi) number of logins; (vii) number of 
conversation requests sent; (viii) number of self-help page 
views; and (ix) number of forum posts, views, and up-votes. 
Table III gives the Pearson correlation coefficient between each 
feature and members’ message rates. The coefficients make clear 
that the gamification features of the platform (accumulated 
coins, hearts, and growth points) are positively related to the 
engagement of a member: strongly for growth points (0.977) and 
moderately for coins and compassion hearts (0.247 and 0.243). 
However, conversation messages sent by members directly 
increase growth points, giving that correlation little meaning. 
Member attributes and behaviors unrelated to communication 
(signup and last login date, distress level, page views, and help 
article views) exhibit little to no correlation, suggesting that 
users dealing with any type and degree of emotional distress, at 
any time, exhibit similar levels of engagement on the site. 
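Each entry of Table III is a plain Pearson correlation between one feature and the per-member message rate. Below is a minimal standard-library sketch, assuming the features and rates are already aligned lists with one entry per member; the member data shown are made up purely to exercise the code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


def correlation_table(message_rate, features):
    """features: {name: values aligned with message_rate} -> {name: r}."""
    return {name: pearson(values, message_rate) for name, values in features.items()}


# Hypothetical toy members, not 7cot data.
message_rate = [1.0, 3.0, 2.0, 4.0]
features = {"growth_points": [1, 2, 3, 4], "distress_level": [2, 2, 2, 2]}
print(correlation_table(message_rate, features))
```

Here growth points give r = 0.8 and the constant distress level gives `nan`. Pearson r only captures linear association between single features, which is one motivation for the random forest that follows.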

Many features exhibit little correlation with user engagement, 
but interaction terms built from subsets of them may be 
positively correlated. For example, users who exhibit a high 
distress level and submit many conversation requests may 
have a high level of engagement even though the features are 
individually not correlated. Instead of exhaustively exploring 
all multi-way interactions, we consider a random forest model 
that predicts user engagement by a regression over all features. 
A random forest is an ensemble of decision trees, each of 
which is trained over a different bootstrap sample of the data. 
During training, each splitting decision in a tree is limited to 
a small random subset of the features. If $X_u$ is a vector of 
member u’s features, the random forest predicts the engagement 
of u as $f(X_u) = N^{-1} \sum_{k=1}^{N} f_k(X_u)$, where $f_k$ is 
the predicted engagement value from the k-th of N decision trees 
in the random forest. The bootstraps, limited choice of features 
for tree splitting, and averaging of results across the tree 
ensemble help ensure that the forest does not overfit the data 
even for large N [20]. We compute the importance of each 
feature to the random forest regression model as follows: let 
$\mathcal{E} = n^{-1} \sum_{i=1}^{n} \left(f(X_i) - y_i\right)^2$ be 
the mean square error (MSE) of the random forest predictions 
against the actual engagement $y_i$ of every member i. The 
importance of feature ℓ may be found by randomly perturbing the 
values of ℓ across every member’s feature vector. Letting 
$X_i^{(\ell)}$ be the feature vector of member i whose ℓ-th 
element is perturbed and 
$\mathcal{E}_\ell = n^{-1} \sum_{i=1}^{n} \left(f(X_i^{(\ell)}) - y_i\right)^2$ 
the MSE of the model using the perturbed vectors, the importance 
of ℓ may be ranked by the percent increase in MSE from 
$\mathcal{E}$ to $\mathcal{E}_\ell$. For example, if feature ℓ is 
not important, the errors of the model will be largely 
insensitive to a reshuffling of its values across all users. 

Fig. 8: Random forest predicted vs. actual engagement 
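The permutation procedure (compare the MSE $\mathcal{E}$ of the intact data with the MSE $\mathcal{E}_\ell$ after shuffling feature ℓ) works with any fitted regressor. The sketch below takes a generic `predict` callable in place of the random forest, since training a forest is out of scope here. It shuffles one feature column at a time and reports the percent increase in MSE. It does not reproduce any particular library's importance measure, and the toy model and data are hypothetical.

```python
import random


def mse(predict, X, y):
    """Mean squared error of predict over feature rows X against targets y."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, seed=0):
    """Percent increase in MSE when each feature column is shuffled across members."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    if base == 0:
        raise ValueError("baseline MSE is zero; percent increase is undefined")
    importances = []
    for ell in range(len(X[0])):
        column = [row[ell] for row in X]
        rng.shuffle(column)  # break the link between feature ell and the target
        X_perm = []
        for row, value in zip(X, column):
            new_row = list(row)
            new_row[ell] = value
            X_perm.append(new_row)
        importances.append(100.0 * (mse(predict, X_perm, y) - base) / base)
    return importances


# Toy check: the target depends on feature 0 only, so shuffling feature 1
# cannot change the predictions.
X = [[i, (7 * i) % 5] for i in range(50)]
y = [2.0 * row[0] + (0.5 if i % 2 else -0.5) for i, row in enumerate(X)]


def predict(row):
    return 2.0 * row[0]


print(permutation_importance(predict, X, y))
```

A negative value, as reported for forum up-votes, means the model's error fell after shuffling, i.e., the feature was contributing noise rather than signal on that data.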

We trained a random forest with N = 1000 trees on 75% of the 
user data, randomly choosing 1/3 of the features for every 
tree splitting decision. Figure 8 gives the quantile and 
prediction scatter plots of the predicted and actual message 
rates for the 25% of users not used to train the random 
forest. The figure demonstrates that the random forest models 
engagement very well ($R^2 > 89\%$): the quantile plot shows 
a linear relationship between the distributions of the predicted 
and actual engagement rates up to the 60th percentile. The 
predicted vs. actual engagement rates in Figure 8b show 
normally distributed errors only for users with low engagement. 
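The evaluation protocol (a 75/25 split, $R^2$ on the held-out users, and a quantile comparison of predicted and actual rates) can be sketched without any particular learning library. The text does not say whether its $R^2$ is $1 - SS_{res}/SS_{tot}$ or the squared correlation between predictions and actual values; the sketch assumes the former. The function names are ours.

```python
import random


def split_train_test(rows, train_frac=0.75, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(train_frac * len(rows))
    return rows[:cut], rows[cut:]


def r_squared(actual, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def quantile_pairs(actual, predicted, probs):
    """Matched empirical quantiles (actual, predicted) for a Q-Q style plot."""
    a, p = sorted(actual), sorted(predicted)

    def pick(values, q):
        return values[min(len(values) - 1, int(q * len(values)))]

    return [(pick(a, q), pick(p, q)) for q in probs]


train, test = split_train_test(range(100))
print(len(train), len(test))                        # 75 25
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

Points from `quantile_pairs` falling on the diagonal indicate that predicted and actual rates share the same distribution, which is the property the quantile plot in Figure 8 is read for.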

Fig. 9: Feature importance in the random forest model 

Since the random forest reasonably models the relationship 
between member and behavioral features and engagement, we use 
it for feature importance analysis. Figure 9 shows the percent 
increase in MSE of a random forest trained with data where each 
factor was individually perturbed across the training data. As 
anticipated by Table III, the total number of growth points 
of a member is the most important factor for predicting user 
engagement, due to its direct correspondence with her message 
rate. Members’ signup and last login dates are the next most 
important features, each of which increases MSE by over 20% 
when perturbed. According to Table III, signup date is only 
very weakly anticorrelated with engagement (-0.009), while last 
login date is weakly positively correlated with it (0.133), so 
recent logins have only a weak relationship to engagement. The 
number of messages sent in group chat is the next non-gaming 
related feature that is important for user engagement. This 
suggests that participating in group chats encourages users to 
become more engaged in their one-on-one conversations. It 
may be the case that users find group settings to be easier or 
less intimidating to participate in, and builds their confidence 
to have lengthy sessions with a listener. Finally, we note that 
the number of up-votes a member has on the forum actually 
introduces noise in the model, since perturbing this factor 
decreases the MSE of the random forest. One explanation 
may be that members who gain recognition for their forum 
posts may be disinclined to participate in conversations since 
they achieve recognition and perhaps satisfaction by only 
participating on the forum. 

B. New user engagement prediction 

New members to a service may be active for a brief 
period of time, then become ‘inactive’ and never return. 
Early identification of new users likely to become inactive 
helps a platform identify those who could be encouraged or 
incentivized to continue seeking help, or become listeners to 
bolster its community. Feature importance analysis of such
an identifier may also reveal the behaviors and attributes that
prompt people to return and seek follow-up support.

We consider a random forest classifier that identifies whether
a member, based on actions during her first two weeks on
7cot, will become an active user. Since there is no
standard definition of an ‘active’ user of a Web service, we
consulted with 7cot administrators to define an active user as
one who: (i) has been registered for at least six weeks; and (ii)
has performed at least two actions on the service over the past
month. We also define a ‘new user’ as one who has registered
within the last two weeks. We identified all members who
registered between May 7th (the first date user action data was
recorded) and November 18th, 2014 (the end of our data set)
and marked them as ‘active’ or ‘inactive’. We then collected counts
of the following actions they performed during their first two weeks
on the site: (i) conversation requests and messages sent;
(ii) forum posts made and viewed; (iii) logins performed;
(iv) help page views; and (v) site pages accessed via 7cot’s
Website and iOS app.
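The labeling rules and feature collection above translate directly into code. This is a sketch under stated assumptions: "the past month" is taken to be the 30 days ending at the evaluation date, actions are assumed to be available as hypothetical `(date, kind)` records, and the function names are illustrative rather than part of 7cot's systems.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Tuple


def is_active(signup: date, action_dates: Iterable[date], as_of: date) -> bool:
    """Active: registered >= six weeks and >= two actions in the past month."""
    registered_long_enough = as_of - signup >= timedelta(weeks=6)
    month_start = as_of - timedelta(days=30)  # assumption: "past month" = 30 days
    recent_actions = sum(1 for d in action_dates if month_start < d <= as_of)
    return registered_long_enough and recent_actions >= 2


def is_new(signup: date, as_of: date) -> bool:
    """New user: registered within the last two weeks."""
    return timedelta(0) <= as_of - signup < timedelta(weeks=2)


def first_two_week_counts(signup: date,
                          actions: Iterable[Tuple[date, str]]) -> Counter:
    """Count each action type (hypothetical labels) in the 14 days after signup."""
    cutoff = signup + timedelta(weeks=2)
    return Counter(kind for day, kind in actions if signup <= day < cutoff)
```

For example, a member who signed up on May 7th, 2014 and acted on November 1st and 10th counts as active on November 18th, while one who signed up on November 1st cannot be, because she has not yet been registered for six weeks.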

52,803 members registered on 7cot during the time period
considered, of which 11,117 (21%) became active and 41,686
(79%) became inactive. We created a training set by randomly
sampling 66% of the registered members for a random forest
classifier to predict whether they are active. Trees are trained in
a similar fashion to regression. Each tree yields its own
prediction of whether a member will be active or inactive given
her actions during the first two weeks, and a majority vote of
the trees then decides the predicted class. Due to the
imbalance in the number of inactive and active members in
the training data, the minority class is randomly oversampled
so that an equal number of inactive and active cases is provided
for training [20]. The trained random forest was tested over
the 33% of users not included in the training set. The
classifier achieves a very promising accuracy of 92.5%, and
the ROC curve in Figure 10a demonstrates only a moderate
false positive rate (ROC curves that reach the (0,1) corner
of the plot correspond to perfect classifiers; the $y = x$ line
represents a classifier that performs random guessing).
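Random oversampling of the minority class, the majority vote over trees, and the false positive rate behind the ROC curve can be sketched as below. This is a minimal illustration, not the authors' pipeline: the helpers are hypothetical, oversampling draws minority rows with replacement until class sizes match, and ties in the vote go to the first class encountered (the paper does not say how ties are handled). Libraries such as imbalanced-learn provide a `RandomOverSampler` for the same purpose.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def random_oversample(rows: Sequence, labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List, List]:
    """Duplicate random minority-class rows until every class has equal size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        extra = [rng.choice(members) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def majority_vote(tree_votes: Sequence[Hashable]) -> Hashable:
    """Class predicted by most trees (ties go to the first class seen)."""
    return Counter(tree_votes).most_common(1)[0][0]


def false_positive_rate(actual, predicted, positive="active") -> float:
    """FP / (FP + TN): share of truly negative cases predicted positive."""
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return fp / (fp + tn)
```

Oversampling is applied only to the training split; the held-out users keep their natural 21%/79% class balance, so accuracy on them should be read alongside the false positive rate.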


As before, we assess the importance of the factors used for
predicting active users. Since the concept of MSE is incompatible
with the notion of a binary classification decision, we
instead consider the Gini index [20] of decision tree nodes
in the forest. The Gini index of a decision tree node $t$ is
defined by $G_t = p_{ta}(1 - p_{ta}) + p_{ti}(1 - p_{ti})$, where $p_{ta}$ ($p_{ti}$)
is the proportion of members marked active (inactive) that
fall into node $t$ based on the splitting criteria of its parent
node. A $G_t$ close to zero suggests that the splitting rule at
the parent divides the data into separate classes, which is
a property of strong decision tree classifiers. We thus rank
the importance of a factor by the average decrease of the
Gini index across all splits in all trees of the forest, shown in
Figure 10b. It reveals that the number of messages sent in
conversations and conversation requests submitted within the
first two weeks are the actions that best predict whether a
user will become active. We further examine the interaction
between these two features by showing the percentage of new
users who became active and submitted $< x$ messages in
their first two weeks in Figure 11. Each trend corresponds
to subsets of members who also submitted fewer than the
specified number of conversation requests. The figure shows
how, for small numbers of conversation requests, the total
number of messages sent in one-on-one conversations strongly
influences whether members become active. But once approximately
five conversations are created, the number of messages sent in
a conversation loses its importance. This may be because new
users who connect with greater numbers of listeners feel more
obligated to return to these connections in the future. On
the other hand, when a user connects to only a few listeners,
a stronger bond between them (i.e., more messages shared) is
necessary to drive the member to return to the site.

Fig. 11: Active users retained by number of conversation
requests and minimum number of messages sent
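The node Gini index $G_t$ and a mean-decrease-in-Gini ranking can be sketched as follows. This is an illustration under assumptions: the paper does not specify how decreases are weighted or aggregated, so the sketch follows the usual CART convention of weighting each child's impurity by its share of the parent's members, summing a feature's decreases within a tree, and averaging over trees. Library implementations such as scikit-learn's `feature_importances_` additionally weight by node size and normalize, so their numbers will differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def gini(n_active: int, n_inactive: int) -> float:
    """G_t = p_ta(1 - p_ta) + p_ti(1 - p_ti) for a node's class counts."""
    total = n_active + n_inactive
    p_a, p_i = n_active / total, n_inactive / total
    return p_a * (1 - p_a) + p_i * (1 - p_i)


def gini_decrease(parent: Tuple[int, int], left: Tuple[int, int],
                  right: Tuple[int, int]) -> float:
    """Parent impurity minus the size-weighted impurity of its two children."""
    n = sum(parent)
    children = sum(sum(child) / n * gini(*child) for child in (left, right))
    return gini(*parent) - children


def mean_decrease_gini(splits_per_tree: List[List[Tuple[str, float]]]) -> Dict[str, float]:
    """Sum each feature's Gini decreases within a tree, then average over trees."""
    totals: Dict[str, float] = defaultdict(float)
    for splits in splits_per_tree:
        for feature, decrease in splits:
            totals[feature] += decrease
    return {f: total / len(splits_per_tree) for f, total in totals.items()}
```

A split that sends five active and five inactive members into two pure children, for instance, lowers the impurity from 0.5 to 0, the largest decrease possible for a two-class node.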

Figure 10b also shows that the number of account logins
performed, the user’s distress level, and activity related to
the online forums within a member’s first two weeks are not
major predictors of her becoming active. The frequency with
which a member accesses the platform thus has little bearing on
whether she will become an active member; what matters is
not the number of times a member visits, but the quality or
productivity of those visits as measured by the number of
messages sent and conversations requested. Furthermore, since
distress level contributes little to predicting whether a member
becomes active, people suffering from both basic and complex
problems may be equally willing to become active in online
emotional support platforms. Finally, public spaces to post
messages, such as forums, do not appear to encourage new members
to become active ones. This may be because forums serve as
a less personal, more public medium of communication.

VI. Conclusions and Future Work 

As society becomes more reliant on Internet-based communication,
online platforms that offer emotional support will
only grow in importance. This paper presented a detailed
analysis of user interactions on 7cot, which is the largest such
platform available today. The analysis yielded important insights
relevant to the understanding of an emotional support platform
that could inform the design of future ones. It shows that
users are respectful to each other and tend to use the platform
during midweek evenings, and that the ability for users to
access large numbers of listeners is important and useful.
Structural analysis revealed a small tendency for members
to connect with sets of listeners who have latent common
attributes, and that the mechanisms letting members connect
to any available listener may lead to a core-periphery structure
of listener relationships. Engagement analysis emphasizes the
importance of mechanisms to track user progress. In addition,
less intimidating group chats may serve as a gateway to engaging
listener conversations, and users facing both simple and
complex problems are equally likely to become active participants.

Future work will characterize guest attributes and behaviors
in more detail, and explore how users transition from being
guests, to members or listeners, and to hybrid users. The
structural analysis will be extended to study the nature of the
cliques emerging in the projection networks. This direction
could reveal popular types of listeners that members search for,
and whether or not the ailments of members can be inferred
based on clique memberships. We will also perform a thorough
engagement analysis on listeners, to understand the mechanisms
and platform designs that keep them active. Improving
the false positive rate of the active user classifier, alternate
quantifications of ‘engagement’, and alternative classifier types
will also be explored to enhance the engagement analysis.